perm filename MACHR[4,KMC]3 blob sn#012461 filedate 1972-11-13 generic text, type T, neo UTF8
00100	COLBY AND MORAVEC
00200	
00300	
00400	CONTEXT-SENSITIVE  FEATURE  RECOGNITION FOR COMPUTER UNDERSTANDING OF
00500	TELETYPED NATURAL LANGUAGE DIALOGUES
00600	
00700	
00800	WHY  IS  IT SO DIFFICULT FOR MACHINES TO UNDERSTAND NATURAL LANGUAGE?
00900	PERHAPS IT IS BECAUSE MACHINES  DO  NOT  SIMULATE  SUFFICIENTLY  WHAT
01000	HUMANS  DO  WHEN HUMANS PROCESS LANGUAGE. SEVERAL YEARS OF EXPERIENCE
01100	WITH COMPUTER SCIENCE AND LINGUISTIC APPROACHES HAVE  TAUGHT  US  THE
01200	SCOPE   AND  LIMITATIONS  OF  SYNTACTICAL,  SEMANTIC  AND  CONCEPTUAL
01300	PARSING [THORNE & BRATLEY] [SIMMONS] [SCHANK] [WILKS] [WOODS]
01400	[WINOGRAD]. WHILE CURRENT PARSERS PERFORM
01500	SATISFACTORILY  WITH  CAREFULLY  EDITED  TEXT   SENTENCES   OR   WITH
01600	EXPRESSIONS  LIMITED  TO  A  TOY  WORLD, THEY ARE UNABLE TO DEAL WITH
01700	EVERYDAY LANGUAGE BEHAVIOR CHARACTERISTIC OF HUMAN CONVERSATION.  IN
01800	AN UNDERSTANDABLY RATIONALISTIC QUEST FOR CERTAINTY AND ATTRACTED BY
01900	AN ANALOGY FROM THE PROOF THEORY OF LOGICIANS  IN  WHICH  PROVABILITY
02000	IMPLIED  COMPUTABILITY,  COMPUTATIONAL  LINGUISTS  HOPED  TO  DEVELOP
02100	CONTEXT-FREE FORMALISMS FOR NATURAL LANGUAGE GRAMMARS. BUT THE HOPE HAS  NOT
02200	BEEN  REALIZED AND PERHAPS IN PRINCIPLE CANNOT BE.   (IT IS DIFFICULT
02300	TO FORMALIZE SOMETHING WHICH CAN  HARDLY  BE  FORMULATED).  IN  THEIR
02400	DIALOGUES   HUMANS   ARE   NEVER   CONTEXT-FREE   LINGUISTICALLY   OR
02500	CONCEPTUALLY.     THE   MAIN   PROBLEM   IS   HOW   TO   MODEL   THIS
02600	CONTEXT-SENSITIVITY.
02700	
02800	LINGUISTIC PARSERS USE MORPHOGRAPHEMIC ANALYSES, PARTS-OF-SPEECH
02900	ASSIGNMENTS  AND  DICTIONARIES  CONTAINING  MULTIPLE WORD-SENSES EACH
03000	POSSESSING SEMANTIC FEATURES FOR RESTRICTING WORD COMBINATIONS.  SUCH
03100	PARSERS PERFORM A WORD-BY-WORD ANALYSIS OF THE INPUT, VALIANTLY
03200	DISAMBIGUATING AT EACH STEP IN AN ATTEMPT TO CONSTRUCT  A  MEANINGFUL
03300	INTERPRETATION.  WHILE  IT  MAY  BE  SOPHISTICATED COMPUTATIONALLY, A
03400	LINGUISTIC PARSER IS QUITE USELESS FOR THE UNDERSTANDING OF  ORDINARY
03500	CONVERSATION.  IN  EVERYDAY  DISCOURSE  PEOPLE SPEAK COLLOQUIALLY AND
03600	IDIOMATICALLY USING ALL SORTS OF PAT PHRASES (`YOU SAID  IT'),  SLANG
03700	(`LETS RAP') AND CLICHES (`THATS THE WAY IT GOES').  THEY ARE CRYPTIC
03800	AND ELLIPTIC.  THEY LACE EVEN THEIR WRITTEN EXPRESSIONS WITH
03900	MEANINGLESS FUZZ (`WELL NOW LETS SEE') AND FRAGMENTS (`REALLY'). THEY
04000	CONVEY  THEIR  INTENTIONS  AND  IDEAS  IN  BOTH   IDIOSYNCRATIC   AND
04100	METAPHORICAL  WAYS, BLITHELY VIOLATING RULES OF 'CORRECT' GRAMMAR AND
04200	SYNTAX.      GIVEN THESE DIFFICULTIES, HOW IS IT THAT PEOPLE CARRY ON
04300	CONVERSATIONS  EASILY  MOST  OF THE TIME WHILE MACHINES HAVE FOUND IT
04400	EXTREMELY DIFFICULT TO  CONTINUE  TO  MAKE  CONCEPTUALLY  APPROPRIATE
04500	REPLIES WHICH COMMUNICATE UNDERSTANDING?
04600	
04700	
04800	
04900	IT SEEMS THAT PEOPLE 'GET THE MESSAGE' WITHOUT ANALYZING EVERY SINGLE
05000	WORD  IN  THE  INPUT AND EVEN IGNORING MANY OF ITS TERMS. PEOPLE MAKE
05100	INDIVIDUALISTIC SELECTIONS  FROM  HIGHLY  REDUNDANT  AND  REPETITIOUS
05200	COMMUNICATIONS.    THESE HIGHLY PERSONAL SELECTIVE OPERATIONS PRODUCE
05300	A TRANSFORMATION OF THE  INPUT  BY  DESTROYING  AND  EVEN  DISTORTING
05400	INFORMATION.  IN  SPEED READING, FOR EXAMPLE, ONLY A SMALL PERCENTAGE
05500	OF CONTENTIVE WORDS ON EACH PAGE  NEED  BE  LOOKED  AT.  THESE  WORDS
05600	SOMEHOW RESONATE WITH THE READER'S RELEVANT CONCEPTUAL-INFERENTIAL
05700	STRUCTURE WHOSE PROCESSES ENABLE HIM TO 'UNDERSTAND' NOT  SIMPLY  THE
05800	LANGUAGE  BUT  ALL  SORTS OF UNMENTIONED ASPECTS ABOUT THE SITUATIONS
05900	AND EVENTS BEING REFERRED TO IN THE LANGUAGE.    IN WRITTEN TEXTS 5/6
06000	OF THE INPUT CAN BE DISTORTED OR DELETED AND THE INTENDED MESSAGE CAN
06100	STILL SUCCESSFULLY BE EXTRACTED. SPOKEN CONVERSATIONS IN ENGLISH  ARE
06200	KNOWN  TO  BE  AT LEAST 50% REDUNDANT.  HALF THE WORDS CAN BE GARBLED
06300	AND LISTENERS NONETHELESS GET THE GIST OR  DRIFT  OF  WHAT  IS  BEING
06400	SAID.   (GIVE FURTHER EXPERIMENTAL EVIDENCE HERE)
06500	
06600	TO  APPROXIMATE  SUCH HUMAN ACHIEVEMENTS WE REQUIRE A NEW PERSPECTIVE
06700	AND A PRACTICAL METHOD WHICH DIFFERS FROM THAT OF CURRENT  LINGUISTIC
06800	PARSING.       THIS  ALTERNATE  APPROACH SHOULD INCORPORATE KNOWLEDGE
06900	GAINED  FROM  WORK  WITH  PARSERS  BUT   SHOULD   UTILIZE   PRIMARILY
07000	INDIVIDUALISTIC-CONCEPTUAL RATHER THAN GENERAL-GRAMMATICAL FEATURES.
07100	PARSERS REPRESENT COMPLEX AND REFINED ALGORITHMS.  WHILE ON ONE  HAND
07200	THEY  SUBJECT  A  SENTENCE  TO  A  DETAILED AND SOMETIMES OVERKILLING
07300	ANALYSIS, ON THE OTHER  THEY  ARE  FINICKY  AND  OVERSENSITIVE.   FOR
07400	EXAMPLE,  A  LINGUISTIC  PARSER  SIMPLY  HALTS IF A WORD IN THE INPUT
07500	SENTENCE IS NOT PRESENT IN ITS DICTIONARY. UNGRAMMATICAL  EXPRESSIONS
07600	SUCH  AS  DOUBLE  PREPOSITIONS  (`DO  YOU WANT TO GET OUT OF FROM THE
07700	HOSPITAL?') ARE QUITE CONFUSING TO THEM. PARSERS CONSTITUTE A
07800	TIGHT CONJUNCTION OF TESTS RATHER THAN A LOOSE DISJUNCTION WHICH PERMITS
07810	PLAUSIBLE GUESSING AND MISUNDERSTANDING. THUS AS MORE AND MORE
07900	TESTS ARE ADDED, THE PARSER BEHAVES LIKE A FINER AND FINER FILTER AND
08000	IT BECOMES HARDER AND HARDER FOR AN EXPRESSION TO PASS THROUGH IT.
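
THE CONTRAST BETWEEN A TIGHT CONJUNCTION AND A LOOSE DISJUNCTION OF
TESTS CAN BE SKETCHED IN A FEW LINES OF PYTHON. THE SKETCH IS
ILLUSTRATIVE ONLY: THE LEXICON, THE FOUR TESTS AND THE FUNCTION NAMES
ARE ASSUMPTIONS INVENTED FOR THIS EXAMPLE AND ARE NOT TAKEN FROM ANY
PARSER CITED ABOVE.

```python
# Hypothetical illustration: the lexicon and the tests below are invented
# for this example and do not come from any parser discussed in the text.
LEXICON = {"do", "you", "want", "to", "get", "out", "of", "from",
           "the", "hospital"}
PREPOSITIONS = {"of", "from", "to"}


def all_words_known(words):
    """Dictionary test: every word must appear in the lexicon."""
    return all(w in LEXICON for w in words)


def no_double_preposition(words):
    """Grammar test: reject two prepositions in a row ('of from')."""
    return not any(a in PREPOSITIONS and b in PREPOSITIONS
                   for a, b in zip(words, words[1:]))


def mentions_hospital(words):
    """Conceptual cue: the hospital is referred to."""
    return "hospital" in words


def mentions_leaving(words):
    """Conceptual cue: leaving or getting out is referred to."""
    return "out" in words or "leave" in words


TESTS = [all_words_known, no_double_preposition,
         mentions_hospital, mentions_leaving]


def strict_accept(words, tests):
    """Tight conjunction: every test must pass, so each added test
    makes the filter finer."""
    return all(test(words) for test in tests)


def loose_accept(words, tests):
    """Loose disjunction: one passing test is enough for a guess,
    which may of course be a misunderstanding."""
    return any(test(words) for test in tests)


sentence = "do you want to get out of from the hospital".split()
print(strict_accept(sentence, TESTS))  # False
print(loose_accept(sentence, TESTS))   # True
```

THE DOUBLE PREPOSITION ALONE IS ENOUGH TO MAKE THE CONJUNCTIVE FILTER
REJECT THE SENTENCE, WHILE THE DISJUNCTIVE ONE STILL HAS TWO CUES ON
WHICH TO BASE A GUESS.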
08100	
08200	ON INTUITIVE GROUNDS IT IS HARDLY CREDIBLE THAT CONVENTIONAL  PARSERS
08300	MODEL THE MECHANISMS PEOPLE USE IN PROCESSING LANGUAGE. AS CHOMSKY
08400	[ ] HAS REMARKED, `WE NOTED AT THE OUTSET THAT PERFORMANCE AND
08500	COMPETENCE MUST BE SHARPLY DISTINGUISHED IF EITHER IS TO BE STUDIED
08600	SUCCESSFULLY. WE HAVE NOW DESCRIBED A CERTAIN MODEL OF COMPETENCE. IT
08700	WOULD  BE  TEMPTING,  BUT  QUITE  ABSURD,  TO REGARD IT AS A MODEL OF
08800	PERFORMANCE AS WELL.   THUS  WE  MIGHT  PROPOSE  THAT  TO  PRODUCE  A
08900	SENTENCE   THE   SPEAKER   GOES   THROUGH  THE  SUCCESSIVE  STEPS  OF
09000	CONSTRUCTING A BASE-DERIVATION, LINE BY LINE FROM THE INITIAL  SYMBOL
09100	S,   THEN   INSERTING   LEXICAL   ITEMS   AND   APPLYING  GRAMMATICAL
09200	TRANSFORMATIONS TO FORM A SURFACE STRUCTURE, AND FINALLY APPLYING THE
09300	PHONOLOGICAL  RULES  IN  THEIR  GIVEN  ORDER,  IN ACCORDANCE WITH THE
09400	CYCLIC PRINCIPLE  DISCUSSED  ABOVE.    THERE  IS  NOT  THE  SLIGHTEST
09500	JUSTIFICATION FOR ANY SUCH ASSUMPTION.' IT SHOULD BE CLEAR FROM THESE
09600	STRICTURES THAT THE TRANSFORMATIONAL APPROACH HAS BEEN CONCERNED WITH
09700	PRODUCTION RATHER THAN INTERPRETATION OF SENTENCES AND THAT IT IS NOT
09800	ORIENTED TOWARDS HUMAN PERFORMANCE BUT TOWARDS AN  IDEALIZED  GRAMMAR
09900	OF COMPETENCE.
10000	
10100	EARLY  ATTEMPTS  TO  DEVELOP  A  FEATURE-RECOGNITION  APPROACH  USING
10200	SPECIAL-PURPOSE HEURISTICS HAVE BEEN DESCRIBED  BY  COLBY,  WATT  AND
10300	GILBERT  [ ], WEIZENBAUM[ ] AND COLBY AND ENEA[ ]. THE LIMITATIONS OF
10400	THESE ATTEMPTS ARE WELL KNOWN TO WORKERS IN ARTIFICIAL  INTELLIGENCE.
10500	SUCH  PRIMITIVE  CONTEXT-RESTRICTED PROGRAMS OFTEN GRASP A TOPIC WELL
10600	ENOUGH BUT TOO OFTEN DO NOT UNDERSTAND QUITE WHAT IS BEING SAID ABOUT
10700	THE  TOPIC, WITH AMUSING OR DISASTROUS CONSEQUENCES. THIS SHORTCOMING
10800	IS BOTH LINGUISTIC AND CONCEPTUAL IN THAT THE FEATURE-RECOGNITION
10900	ABILITIES OF SUCH PROGRAMS ARE RUDIMENTARY AND THEY LACK A RICH
11000	CONCEPTUAL STRUCTURE INTO WHICH THE PATTERN ABSTRACTED FROM THE INPUT
11100	CAN  BE  MATCHED  FOR  FURTHER  INFERENCING.   IN  OUR EXPERIENCE THE
11200	MAN-MACHINE  CONVERSATIONS  SOON  BECAME  IMPOVERISHED  AND   BORING.
11300	WINOGRAD'S PROGRAM, WHILE LIMITED TO A FEW OBJECTS AND RELATIONS IN A
11400	TOY ROBOTIC WORLD, REPRESENTED A GREAT IMPROVEMENT IN THE
11500	FEATURE-RECOGNITION APPROACH. HOWEVER, MANY OF HIS FEATURES, SUCH AS
11600	DETERMINERS AND NOUN GROUPS, WERE GRAMMATICALLY RATHER THAN
11700	CONCEPTUALLY ORIENTED. ANOTHER FEATURE-RECOGNITION APPROACH IS THAT
11800	OF WILKS[ ] WORKING IN THE AREA OF MACHINE TRANSLATION. HIS ALGORITHM
11900	CONSTRUCTS A PATTERN FROM ENGLISH TEXT INPUT WHICH IS MATCHED AGAINST
12000	TEMPLATES IN AN INTERLINGUAL DATA BASE FROM WHICH, IN TURN, FRENCH
12100	OUTPUT IS GENERATED WITHOUT USING A GENERATIVE GRAMMAR.
12200	
12300	IN  THE  COURSE  OF CONSTRUCTING A COMPUTER SIMULATION OF PARANOIA WE
12400	WERE FACED WITH THE PROBLEM OF DEALING WITH NATURAL LANGUAGE AS IT IS
12500	USED IN THE DOCTOR-PATIENT SITUATION OF A PSYCHIATRIC INTERVIEW. THIS
12600	DOMAIN OF DISCOURSE ADMITTEDLY CONTAINS MANY STEREOTYPES (`WHAT
12700	BROUGHT YOU TO THE HOSPITAL?') AND IS CONSTRAINED IN TOPICS (NEWTON'S
12800	LAWS ARE RARELY DISCUSSED). BUT IT IS RICH ENOUGH IN VERBAL  BEHAVIOR
12900	TO BE A CHALLENGE TO A LANGUAGE UNDERSTANDING ALGORITHM SINCE A GREAT
13000	VARIETY OF HUMAN EXPERIENCES ARE DISCUSSED IN THIS  DOMAIN  INCLUDING
13100	THE  RELATION WHICH DEVELOPS BETWEEN THE INTERVIEW PARTICIPANTS.  THE
13200	JUDGEMENT OF 'PARANOIA' IS MADE BY PSYCHIATRISTS  RELYING  MAINLY  ON
13300	THE  VERBAL BEHAVIOR OF THE INTERVIEWED PATIENT.  IF A PARANOID MODEL
13400	IS TO EXHIBIT PARANOID BEHAVIOR IN A PSYCHIATRIC INTERVIEW,  IT  MUST
13500	BE  CAPABLE  OF  HANDLING  DIALOGUES  TYPICAL  OF  THE DOCTOR-PATIENT
13600	CONTEXT.     SINCE THE MODEL CAN COMMUNICATE ONLY  THROUGH  TELETYPED
13700	MESSAGES, THE VIS-A-VIS ASPECTS OF THE USUAL PSYCHIATRIC INTERVIEW ARE
13800	ABSENT.  THUS THE MODEL SHOULD  BE  ABLE  TO  DEAL  WITH  TYPEWRITTEN
13900	NATURAL  LANGUAGE INPUT AND TO OUTPUT REPLIES WHICH ARE INDICATIVE OF
14000	AN UNDERLYING PARANOID THOUGHT PROCESS.
14100	
14200	IN A PSYCHIATRIC INTERVIEW THERE IS ALWAYS A WHO SAYING SOMETHING  TO
14300	A  WHOM  WITH  DEFINITE  INTENTIONS AND EXPECTATIONS.   THERE ARE TWO
14400	SITUATIONS TO BE TAKEN INTO ACCOUNT, THE ONE BEING TALKED  ABOUT  AND
14500	THE  ONE  THE  PARTICIPANTS  ARE IN. SOMETIMES THE LATTER BECOMES THE
14600	FORMER.    AS WEIZENBAUM [ ] HAS EMPHASIZED FOR COMPUTER  SCIENTISTS,
14700	PARTICIPANTS  IN  DIALOGUES HAVE PURPOSES AND MACHINES MUST RECOGNIZE
14800	THIS FACT.   THE DOCTOR'S PURPOSE  IS  TO  GATHER  CERTAIN  KINDS  OF
14900	INFORMATION  WHILE  THE  PATIENT'S PURPOSE IS TO GIVE INFORMATION AND
15000	GET HELP.  A JOB IS TO BE DONE; IT IS NOT SMALL  TALK.   OUR  WORKING
15100	HYPOTHESIS  IS  THAT EACH PARTICIPANT IN THE DIALOGUE UNDERSTANDS THE
15200	OTHER BY MATCHING SELECTED PERSONALLY-SIGNIFICANT FEATURES IN THE
15300	INPUT AGAINST CONCEPTUAL PATTERNS WHICH CONTAIN INFORMATION ABOUT THE
15400	SITUATION  OR  EVENT  BEING  DESCRIBED  LINGUISTICALLY.          THIS
15500	UNDERSTANDING  IS  COMMUNICATED  RECIPROCALLY BY LINGUISTIC RESPONSES
15600	JUDGED  APPROPRIATE  TO  THE  INTENTIONS  AND  EXPECTATIONS  OF   THE
15700	PARTICIPANTS  AND TO THE REQUIREMENTS OF THE SITUATION. IN THIS PAPER
15800	WE SHALL  DESCRIBE  ONLY  THE  CONTEXT-SENSITIVE  FEATURE-RECOGNITION
15900	PROCESSES USED TO EXTRACT A PATTERN FROM NATURAL LANGUAGE INPUT. IN A
16000	LATER COMMUNICATION  WE  SHALL  DESCRIBE  THE  INFERENTIAL  PROCESSES
16100	CARRIED  OUT  AT THE CONCEPTUAL LEVEL ONCE THE `PARADIGMATIC' PATTERN
16200	HAS BEEN RECEIVED FROM THE FEATURE-RECOGNITION PROCESSES.
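
AS A ROUGH ILLUSTRATION OF THE WORKING HYPOTHESIS, THE FOLLOWING PYTHON
SKETCH SELECTS A FEW SIGNIFICANT WORDS FROM AN INPUT, IGNORES THE REST,
AND SCORES THE RESULTING FEATURES AGAINST STORED PATTERNS. THE FEATURE
TABLE, THE PATTERN NAMES AND THE SCORING RULE ARE ALL HYPOTHETICAL. IT
IS NOT THE FEATURE RECOGNIZER OF THE PARANOID MODEL, AND IT TAKES NO
ACCOUNT OF THE PRECEDING CONTEXT OF THE DIALOGUE.

```python
# Hypothetical sketch: the feature table, pattern names and scoring rule
# are invented for illustration and are not the authors' recognizer.
FEATURES = {
    "hospital": "HOSPITAL", "ward": "HOSPITAL",
    "brought": "CAUSE", "why": "CAUSE",
    "you": "SELF",
    "afraid": "FEAR", "scared": "FEAR",
}

PATTERNS = {
    "ASK-REASON-FOR-HOSPITALIZATION": {"CAUSE", "SELF", "HOSPITAL"},
    "ASK-ABOUT-FEAR": {"SELF", "FEAR"},
}


def extract_features(text):
    """Keep only words found in the feature table; all other words
    (fuzz, unknown words, function words) are simply ignored."""
    words = text.lower().replace("?", " ").replace(",", " ").split()
    return {FEATURES[w] for w in words if w in FEATURES}


def best_pattern(text):
    """Return (pattern name, score) for the stored pattern whose
    features are best covered by the input, or (None, 0.0)."""
    found = extract_features(text)
    best_name, best_score = None, 0.0
    for name, required in PATTERNS.items():
        score = len(found & required) / len(required)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


print(best_pattern("What brought you to the hospital?"))
print(best_pattern("well now lets see, are you scared"))
print(best_pattern("tell me about newton"))
```

UNKNOWN WORDS DO NOT HALT THE PROCESS AS THEY WOULD IN A DICTIONARY-
BOUND PARSER. AN INPUT WITH NO RECOGNIZED FEATURES SIMPLY YIELDS NO
PATTERN, LEAVING THE CHOICE OF REPLY TO SOME OTHER PROCESS.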
16300	
16400	
16500	(HANS WRITES DESCRIPTION OF HIS FEATURE RECOGNIZER)